Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data

Authors

  • Yunhong Gu
  • Robert L. Grossman
Abstract

Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. In contrast to existing storage and compute clouds, Sector can manage data not only within a data center, but also across geographically distributed data centers. Similarly, the Sphere compute cloud supports User Defined Functions (UDF) over data both within a data center and across data centers. As a special case, MapReduce style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort Benchmark. In these studies, Sector is about twice as fast as Hadoop. Sector/Sphere is open source.
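The abstract's remark that MapReduce-style programming is a special case of chaining two UDFs can be illustrated with a minimal Python sketch. This is not Sphere's actual API: the names `apply_udf`, `map_udf`, `shuffle`, `reduce_udf`, and `word_count` are hypothetical, and the hash-based bucketing step between the two UDFs is an assumption made only for this illustration.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]


def apply_udf(udf: Callable, segments: List[list]) -> List[list]:
    """Apply a user-defined function independently to every data segment."""
    return [udf(segment) for segment in segments]


def map_udf(segment: List[str]) -> List[Record]:
    """Hypothetical Map UDF: emit (word, 1) for each word in the segment."""
    return [(word, 1) for line in segment for word in line.split()]


def shuffle(mapped: List[List[Record]], num_buckets: int) -> List[List[Record]]:
    """Route records to buckets by key hash (assumed intermediate step)."""
    buckets: List[List[Record]] = [[] for _ in range(num_buckets)]
    for segment in mapped:
        for key, value in segment:
            index = zlib.crc32(key.encode("utf-8")) % num_buckets
            buckets[index].append((key, value))
    return buckets


def reduce_udf(bucket: List[Record]) -> List[Record]:
    """Hypothetical Reduce UDF: sum the values of each key in one bucket."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in bucket:
        totals[key] += value
    return sorted(totals.items())


def word_count(segments: List[List[str]], num_buckets: int = 2) -> Dict[str, int]:
    """MapReduce-style word count: a Map UDF followed by a Reduce UDF."""
    mapped = apply_udf(map_udf, segments)
    reduced = apply_udf(reduce_udf, shuffle(mapped, num_buckets))
    return {key: total for bucket in reduced for key, total in bucket}
```

In this sketch each UDF sees only one segment or bucket at a time, which is what would let a runtime place the work next to the data; whether Sphere schedules work this way is not stated in the abstract.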

Similar resources

Sector and Sphere: the design and implementation of a high-performance data cloud

Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but...

An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity

The data grid technology, which uses the scale of the Internet to solve storage limitation for the huge amount of data, has become one of the hot research topics. Recently, data replication strategies have been widely employed in distributed environment to copy frequently accessed data in suitable sites. The primary purposes are shortening distance of file transmission and achieving files from ...

Security-Constrained Unit Commitment Considering Large-Scale Compressed Air Energy Storage (CAES) Integrated With Wind Power Generation

Environmental concerns and the depletion of nonrenewable resources have generated great interest in renewable energy resources. Cleanness and high potential are factors that have driven the fast growth of wind energy. However, the stochastic nature of wind energy makes the presence of energy storage systems (ESS) in wind-integrated power systems inevitable. Due to capability of being used in large-scale sys...

Logistics performance of European Union markets: Towards the development of entrepreneurship in the transport and storage sector

The globalization of markets is one of the factors creating conditions for the development of entrepreneurship. Entrepreneurship does not have one generally accepted definition. Most often, entrepreneurship is perceived as the ability to increase the number of enterprises. Entrepreneurship can be understood as the potential to identify and use development opportunities regardless of own resources....

E2DR: Energy Efficient Data Replication in Data Grid

Data grids are an important branch of grid computing that provides mechanisms for the management of large volumes of distributed data. Energy efficiency has recently emerged as a hot topic in large distributed systems. The development of computing systems is traditionally focused on performance improvements driven by the demand of client's applications in scientific and business domai...

Journal:
  • CoRR

Volume abs/0809.1181

Publication date 2008